
Skin-SOAP: A Weakly Supervised Framework for Generating Structured SOAP Notes

Kamal, Sadia, Oates, Tim, Wan, Joy

arXiv.org Artificial Intelligence

Skin carcinoma is the most prevalent form of cancer globally, accounting for over $8 billion in annual healthcare expenditures. Early diagnosis and accurate, timely treatment are critical to improving patient survival rates. In clinical settings, physicians document patient visits using detailed SOAP (Subjective, Objective, Assessment, and Plan) notes. However, manually generating these notes is labor-intensive and contributes to clinician burnout. In this work, we propose skin-SOAP, a weakly supervised multimodal framework to generate clinically structured SOAP notes from limited inputs, including lesion images and sparse clinical text. Our approach reduces reliance on manual annotations, enabling scalable, clinically grounded documentation while alleviating clinician burden and reducing the need for large annotated datasets. Our method achieves performance comparable to GPT-4o, Claude, and DeepSeek Janus Pro across key clinical relevance metrics. To evaluate this clinical relevance, we introduce two novel metrics, MedConceptEval and the Clinical Coherence Score (CCS), which assess semantic alignment with expert medical concepts and with input features, respectively.
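The abstract does not give a formula for MedConceptEval. As a rough, hypothetical illustration of "semantic alignment with expert medical concepts", one could average the similarity between a note and each concept phrase; the bag-of-words vectors below are toy stand-ins for the learned embeddings such a metric would actually use.

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real metric would use a medical
    # language model's embeddings instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def concept_alignment(note: str, concepts: list[str]) -> float:
    # Mean note-to-concept similarity over a panel of expert concept phrases.
    nv = bow(note)
    return sum(cosine(nv, bow(c)) for c in concepts) / len(concepts)
```

A higher score indicates that the note's wording overlaps more with the expert concept panel; the actual MedConceptEval definition may differ.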


Toward the Autonomous AI Doctor: Quantitative Benchmarking of an Autonomous Agentic AI Versus Board-Certified Clinicians in a Real World Setting

Hayat, Hashim, Kudrautsau, Maksim, Makarov, Evgeniy, Melnichenko, Vlad, Tsykunou, Tim, Varaksin, Piotr, Pavelle, Matt, Oskowitz, Adam Z.

arXiv.org Artificial Intelligence

The CSS was accompanied by a natural language explanation of the scores. The LLM judge role used GPT-4.0 by OpenAI. Evaluation by Human Experts Each encounter pair in which the top diagnosis of the AI and the clinician did not match was evaluated by a board-certified physician with access to medical reference material. Blinding the physician to the origin of the documentation proved impractical, as the AI-based notes were highly consistent and thus easily recognized within a few pairs. The physician was asked to determine the cause of the disagreement between the documents, whether the AI or the physician was more likely to be correct, whether it was not possible to determine which diagnosis was more appropriate, and whether the diagnoses did, in fact, match. Similarity and Style Metrics To evaluate how similar (or different) the AI-generated (Doctronic) and clinician-generated SOAP notes were, we followed a two-step process. First, we assessed surface-level textual similarity using three standard statistical metrics: (1) TF-IDF cosine similarity, which transforms each note into a weighted term-frequency vector and measures the cosine of the angle between them to capture word-frequency alignment; (2) the Jaccard index, which is the ratio of the intersection to the union of lowercased token sets, ranging from 0 (no overlap) to 1 (identical token sets); and (3) the Levenshtein ratio, a normalized edit-distance score based on character-level insertions, deletions, and substitutions that quantifies textual similarity on a 0-1 scale. These analyses demonstrated only minimal alignment in phrasing, formatting, and vocabulary. Then, to probe contextual and semantic similarity, we generated embeddings for each note using OpenAI's text-embedding-3-small model and two versions of BioBERT,
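The three surface-level metrics described above can be sketched directly from their definitions. This is a minimal stdlib-only illustration: with only two documents there is no corpus to derive IDF weights from, so the first function reduces to plain term-frequency cosine similarity, which a real pipeline would weight by corpus IDF.

```python
import math
from collections import Counter

def tf_cosine(a: str, b: str) -> float:
    # Term-frequency cosine similarity; stands in for TF-IDF when
    # no larger corpus is available to compute IDF weights.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(a: str, b: str) -> float:
    # Intersection over union of lowercased token sets, in [0, 1].
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def levenshtein_ratio(a: str, b: str) -> float:
    # Character-level edit distance (insertions, deletions, substitutions),
    # normalized to a 0-1 similarity score.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    longest = max(len(a), len(b))
    return 1.0 - prev[-1] / longest if longest else 1.0
```

For example, `jaccard("a b c", "b c d")` is 0.5 (two shared tokens out of four distinct), and `levenshtein_ratio("kitten", "sitting")` is 1 − 3/7, since the classic edit distance between those strings is 3.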


A Custom-Built Ambient Scribe Reduces Cognitive Load and Documentation Burden for Telehealth Clinicians

Morse, Justin, Gilbert, Kurt, Shin, Kyle, Cooke, Rick, Rose, Peyton, Sullivan, Jack, Sisante, Angelo

arXiv.org Artificial Intelligence

Clinician burnout has motivated the growing adoption of ambient medical scribes in the clinic. In this work, we introduce a custom-built ambient scribe application integrated into the EHR system at Included Health, a personalized all-in-one healthcare company offering telehealth services. The application uses Whisper for transcription and a modular in-context learning pipeline with GPT-4o to automatically generate SOAP notes and patient instructions. Testing on mock visit data shows that the notes generated by the application exceed the quality of expert-written notes as determined by an LLM-as-a-judge. The application has been widely adopted by the clinical practice, with over 540 clinicians at Included Health using the application at least once. 94% (n = 63) of surveyed clinicians report reduced cognitive load during visits and 97% (n = 66) report less documentation burden when using the application. Additionally, we show that post-processing notes with a fine-tuned BART model improves conciseness. These findings highlight the potential for AI systems to ease administrative burdens and support clinicians in delivering efficient, high-quality care.


Towards physician-centered oversight of conversational diagnostic AI

Vedadi, Elahe, Barrett, David, Harris, Natalie, Wulczyn, Ellery, Reddy, Shashir, Ruparel, Roma, Schaekermann, Mike, Strother, Tim, Tanno, Ryutaro, Sharma, Yash, Lee, Jihyeon, Hughes, Cían, Slack, Dylan, Palepu, Anil, Freyberg, Jan, Saab, Khaled, Liévin, Valentin, Weng, Wei-Hung, Tu, Tao, Liu, Yun, Tomasev, Nenad, Kulkarni, Kavita, Mahdavi, S. Sara, Guu, Kelvin, Barral, Joëlle, Webster, Dale R., Manyika, James, Hassidim, Avinatan, Chou, Katherine, Matias, Yossi, Kohli, Pushmeet, Rodman, Adam, Natarajan, Vivek, Karthikesalingam, Alan, Stutz, David

arXiv.org Artificial Intelligence

Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity by licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability of the clinical decision. This effectively decouples oversight from intake and can thus happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs or a group of PCPs under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practices and likely underestimates clinicians' capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm for diagnostic AI systems to operate under expert human oversight for enhancing real-world care.


Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework

Kamal, Sadia, Oates, Tim, Wan, Joy

arXiv.org Artificial Intelligence

Skin carcinoma is the most prevalent form of cancer globally, accounting for over $8 billion in annual healthcare expenditures. In clinical settings, physicians document patient visits using detailed SOAP (Subjective, Objective, Assessment, and Plan) notes. However, manually generating these notes is labor-intensive and contributes to clinician burnout. In this work, we propose a weakly supervised multimodal framework to generate clinically structured SOAP notes from limited inputs, including lesion images and sparse clinical text. Our approach reduces reliance on manual annotations, enabling scalable, clinically grounded documentation while alleviating clinician burden and reducing the need for large annotated datasets. Our method achieves performance comparable to GPT-4o, Claude, and DeepSeek Janus Pro across key clinical relevance metrics. To evaluate clinical quality, we introduce two novel metrics, MedConceptEval and the Clinical Coherence Score (CCS), which assess semantic alignment with expert medical concepts and with input features, respectively.


Do Physicians Know How to Prompt? The Need for Automatic Prompt Optimization Help in Clinical Note Generation

Yao, Zonghai, Jaafar, Ahmed, Wang, Beining, Zhu, Yue, Yang, Zhichao, Yu, Hong

arXiv.org Artificial Intelligence

This study examines the effect of prompt engineering on the performance of Large Language Models (LLMs) in clinical note generation. We introduce an Automatic Prompt Optimization (APO) framework to refine initial prompts and compare the outputs of medical experts, non-medical experts, and APO-enhanced GPT-3.5 and GPT-4. Results highlight GPT-4 APO's superior performance in standardizing prompt quality across clinical note sections. A human-in-the-loop approach shows that experts maintain content quality post-APO, with a preference for their own modifications, suggesting the value of expert customization. We recommend a two-phase optimization process, leveraging APO-GPT-4 for consistency and expert input for personalization.


Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances

Krishna, Kundan, Pavel, Amy, Schloss, Benjamin, Bigham, Jeffrey P., Lipton, Zachary C.

arXiv.org Artificial Intelligence

Despite diverse efforts to mine various modalities of medical data, the conversations between physicians and patients at the time of care remain an untapped source of insights. In this paper, we leverage this data to extract structured information that might assist physicians with post-visit documentation in electronic health records, potentially lightening the clerical burden. In this exploratory study, we describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels. We focus on the tasks of recognizing relevant diagnoses and abnormalities in the review of organ systems (RoS). One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input. To address this challenge, we extract noteworthy utterances---parts of the conversation likely to be cited as evidence supporting some summary sentence. We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.


Generating SOAP Notes from Doctor-Patient Conversations

Krishna, Kundan, Khosla, Sopan, Bigham, Jeffrey P., Lipton, Zachary C.

arXiv.org Artificial Intelligence

Following each patient visit, physicians must draft detailed clinical summaries called SOAP notes. Moreover, with electronic health records, these notes must be digitized. For all the benefits of this documentation, the process remains onerous, contributing to increasing physician burnout. In a parallel development, patients increasingly record audio from their visits (with consent), often through dedicated apps. In this paper, we present the first study to evaluate complete pipelines for leveraging these transcripts to train machine learning models to generate these notes. We first describe a unique dataset of patient visit records, consisting of transcripts, paired SOAP notes, and annotations marking noteworthy utterances that support each summary sentence. We decompose the problem into extractive and abstractive subtasks, exploring a spectrum of approaches according to how much they demand from each component. Our best performing method first (i) extracts noteworthy utterances via multi-label classification, assigning them to summary section(s); (ii) clusters noteworthy utterances on a per-section basis; and (iii) generates the summary sentences by conditioning on the corresponding cluster and the subsection of the SOAP note to be generated. Compared to an end-to-end approach that generates the full SOAP note from the full conversation, our approach improves by 7 ROUGE-1 points. Oracle experiments indicate that fixing our generative capabilities, improvements in extraction alone could provide (up to) a further 9 ROUGE point gain.
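The three-stage decomposition described above can be sketched structurally. Everything below is a toy stand-in: the paper trains learned models for each stage, while here a hypothetical keyword rule plays the multi-label classifier, grouping by section label plays the clustering step, and string concatenation plays the conditional generator.

```python
from collections import defaultdict

# Hypothetical keyword rules standing in for a learned multi-label classifier.
SECTION_KEYWORDS = {
    "subjective": ["feel", "pain", "since"],
    "assessment": ["diagnosis", "likely"],
    "plan": ["prescribe", "follow up"],
}

def extract_noteworthy(utterances):
    # Stage (i): tag each utterance with the SOAP section(s) it supports,
    # keeping only utterances that matched at least one section.
    labeled = []
    for u in utterances:
        sections = [s for s, kws in SECTION_KEYWORDS.items()
                    if any(k in u.lower() for k in kws)]
        if sections:
            labeled.append((u, sections))
    return labeled

def cluster_by_section(labeled):
    # Stage (ii): group noteworthy utterances on a per-section basis.
    clusters = defaultdict(list)
    for u, sections in labeled:
        for s in sections:
            clusters[s].append(u)
    return clusters

def generate_note(clusters):
    # Stage (iii): produce each section from its cluster; the paper
    # conditions a generative model here rather than concatenating.
    return {s: " ".join(us) for s, us in clusters.items()}
```

Running the stages in sequence on a short mock transcript yields a dictionary keyed by SOAP section, mirroring the pipeline's extract-cluster-generate flow.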